Automatic Synthesis of Bioconductor Pipelines: A Domain Modeling Challenge

Authors

  • Anna-Lena Lamprecht
  • Tiziana Margaria
Abstract

GNU R is a widely used programming language and software environment for statistical data analysis and visualization. Bioconductor [1] is a collection of bioinformatics packages that extends R's standard functionality with comprehensive libraries of functions and metadata, predominantly for the analysis of data from high-throughput genomics and molecular biology experiments. It additionally provides several example data sets that are useful for testing, benchmarking and demonstration purposes. Reference manuals and additional manuscripts provided with the packages at the Bioconductor web site describe a variety of data analysis procedures based on the available functionality. The described bioinformatics procedures or workflows are commonly referred to as data analysis pipelines, as they typically have a simple, in fact mostly linear, structure. (This is largely because the often considerable complexity of the individual analysis steps is encapsulated by services with a simple interface, and hence hidden from the user at the provided level of abstraction.) Thus, automatic workflow composition functionality like the method described in [6], which is based on a linear-time logic synthesis algorithm, can be easily applied here to generate complete analysis pipelines automatically. Making use of the constraint-driven workflow composition functionality of the PROPHETS framework [5], which is based on such a linear-time logic synthesis algorithm, we created a prototype of a corresponding synthesis framework [3]. Focusing on DNA microarray analysis, this prototype comprises around 30 services and provides a framework for user-level construction of analysis pipelines based on appropriately wrapped and integrated Bioconductor functionality. It helps handle the variability that is inherent in microarray data analysis at a user-accessible level and emphasizes the agility of a model-driven and service-oriented approach to workflow design.
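
To make the idea of composing a mostly linear pipeline from typed services more concrete, the following is a minimal sketch in Python. It is not the linear-time logic synthesis algorithm used by PROPHETS; it substitutes a plain breadth-first search over type-compatible services, with a "must use this service" constraint standing in for the richer constraints of the real framework. The service names, the data types and the `synthesize` function are all hypothetical and are not taken from the prototype.

```python
from collections import deque

# Hypothetical service catalogue: name -> (input type, output type).
# Names and types are illustrative assumptions, not the prototype's services.
SERVICES = {
    "read_cel_files": ("CelFiles", "RawData"),
    "quality_report": ("RawData", "RawData"),
    "normalize": ("RawData", "ExpressionSet"),
    "filter_genes": ("ExpressionSet", "ExpressionSet"),
    "fit_linear_model": ("ExpressionSet", "ModelFit"),
    "top_table": ("ModelFit", "GeneList"),
}


def synthesize(start, goal, must_use=(), max_len=6):
    """Return all shortest linear pipelines from `start` to `goal`.

    A pipeline is a list of service names whose input and output types
    chain together. Only pipelines containing every service in `must_use`
    are accepted. Each service is used at most once per pipeline.
    """
    required = frozenset(must_use)
    queue = deque([(start, ())])  # (current data type, services so far)
    solutions = []
    best = None  # length of the shortest accepted pipeline found so far
    while queue:
        current, path = queue.popleft()
        # Breadth-first order: once paths get longer than the best
        # solution, no further shortest solutions can appear.
        if best is not None and len(path) > best:
            break
        if current == goal and required <= set(path):
            solutions.append(list(path))
            best = len(path)
            continue
        if len(path) == max_len:
            continue
        for name, (src, dst) in SERVICES.items():
            if src == current and name not in path:
                queue.append((dst, path + (name,)))
    return solutions


if __name__ == "__main__":
    for pipeline in synthesize("CelFiles", "GeneList", must_use=["quality_report"]):
        print(" -> ".join(pipeline))
```

In this sketch the user states only the start type, the goal type and optional constraints, and the search fills in the intermediate steps. A temporal-logic formulation, as referenced in the abstract, can express considerably richer constraints (for example ordering between services) than the single inclusion constraint shown here.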
PROPHETS is the current reference implementation of the loose programming paradigm [4], which aims to simplify workflow development in order to reach application experts without a programming background. Working with PROPHETS consists of two major phases:

Similar resources

Automatic Generation of a Multi Agent System for Crisis Management by a Model Driven Approach

Considering the increasing occurrences of unexpected events and the need for pre-crisis planning in order to reduce risks and losses, modeling instant response environments is needed more than ever. Modeling may lead to more careful planning for crisis-response operations, such as team formation, task assignment, and doing the task by teams. A common challenge in this way is that the model shou...


Providing a structural model for psychological problems based on disconnection and rejection domain and negative automatic thoughts with mediating role of experimental avoidance

Introduction: Psychological problems are the result of a person's interaction with the environment and include behaviors that cause social conflicts, dissatisfaction and individual unhappiness. The present study aimed to provide a structural model for psychological problems based on disconnection and rejection domain and negative automatic thoughts with mediating role of experimental avoidance....


arrayQualityMetrics—a bioconductor package for quality assessment of microarray data

SUMMARY The assessment of data quality is a major concern in microarray analysis. arrayQualityMetrics is a Bioconductor package that provides a report with diagnostic plots for one or two colour microarray data. The quality metrics assess reproducibility, identify apparent outlier arrays and compute measures of signal-to-noise ratio. The tool handles most current microarray technologies and is ...


KEGGgraph: Application Examples

In this vignette, we demonstrate the application of KEGGgraph as a flexible module in analysis pipelines targeting heterogeneous biological questions. For basic use of the KEGGgraph package, please refer to the vignette KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor.


OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis

Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus of their importance to the future of the field, have been slow to gain wi...


Publication date: 2015